Purpose: In laparoscopic liver surgery (LLS), pre-operative information can be overlaid onto the intra-operative scene by registering a 3D pre-operative model to the intra-operative partial surface reconstructed from the laparoscopic video. To assist with this task, we explore the use of learning-based feature descriptors, which, to the best of our knowledge, have not been explored for use in laparoscopic liver registration (LLR). Furthermore, a dataset to train and evaluate the use of learning-based descriptors does not exist. Methods: We present the LiverMatch dataset, consisting of 16 pre-operative models and their simulated intra-operative 3D surfaces. We also propose the LiverMatch network designed for this task, which outputs per-point feature descriptors, visibility scores, and matched points. Results: We compare the proposed LiverMatch network with a network closest to LiverMatch and a histogram-based 3D descriptor on the testing split of the LiverMatch dataset, which includes two unseen pre-operative models and 1400 intra-operative surfaces. Results suggest that our LiverMatch network can predict more accurate and dense matches than the other two methods and can be seamlessly integrated with a RANSAC-ICP-based registration algorithm to achieve an accurate initial alignment. Conclusion: The use of learning-based feature descriptors in LLR is promising, as it can help achieve an accurate initial rigid alignment, which, in turn, serves as an initialization for subsequent non-rigid registration. We will release the dataset and code upon acceptance.
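The rigid alignment that a RANSAC-ICP pipeline estimates from such per-point matches has a closed-form core: given matched point pairs, the least-squares rigid transform follows from the Kabsch/SVD method. The sketch below is a generic illustration on synthetic points, not the paper's pipeline.

```python
import numpy as np

def rigid_align(src, dst):
    """Estimate the rigid transform (R, t) mapping matched source points
    onto destination points via the Kabsch/SVD method."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Toy check: recover a known rotation and translation from clean matches.
rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 3))
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.3])
R, t = rigid_align(pts, pts @ R_true.T + t_true)
```

In practice RANSAC repeatedly fits this transform on random match subsets to reject outlier matches, and ICP then refines the surviving alignment.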
Machine learning models have been found to learn shortcuts -- unintended decision rules that are unable to generalize -- undermining models' reliability. Previous works address this problem under the tenuous assumption that only a single shortcut exists in the training data. Real-world images are rife with multiple visual cues from background to texture. Key to advancing the reliability of vision systems is understanding whether existing methods can overcome multiple shortcuts or struggle in a Whac-A-Mole game, i.e., where mitigating one shortcut amplifies reliance on others. To address this shortcoming, we propose two benchmarks: 1) UrbanCars, a dataset with precisely controlled spurious cues, and 2) ImageNet-W, an evaluation set based on ImageNet for watermark, a shortcut we discovered affects nearly every modern vision model. Along with texture and background, ImageNet-W allows us to study multiple shortcuts emerging from training on natural images. We find computer vision models, including large foundation models -- regardless of training set, architecture, and supervision -- struggle when multiple shortcuts are present. Even methods explicitly designed to combat shortcuts struggle in a Whac-A-Mole dilemma. To tackle this challenge, we propose Last Layer Ensemble, a simple-yet-effective method to mitigate multiple shortcuts without Whac-A-Mole behavior. Our results surface multi-shortcut mitigation as an overlooked challenge critical to advancing the reliability of vision systems. The datasets and code are released: https://github.com/facebookresearch/Whac-A-Mole.git.
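One reading of the Last Layer Ensemble idea is schematic: a shared feature extractor feeds several classification heads, each associated with an augmentation targeting one shortcut, and their predictions are combined at test time. The numpy sketch below illustrates only that inference-time structure under this reading of the abstract; it is not the authors' implementation, and all names and shapes are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_classes, feat_dim, n_heads = 4, 16, 3   # one head per targeted shortcut

# Shared backbone features for a batch of 8 images (placeholder values).
features = rng.normal(size=(8, feat_dim))

# One linear "last layer" per shortcut-specific augmentation.
heads = [rng.normal(size=(feat_dim, n_classes)) for _ in range(n_heads)]

# Inference: average the heads' class probabilities, then predict.
probs = np.mean([softmax(features @ W) for W in heads], axis=0)
pred = probs.argmax(axis=1)
```

Keeping the heads separate means mitigating a new shortcut adds a head rather than retraining the others, which is one way to avoid trading one shortcut for another.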
Developing robust and fair AI systems requires datasets with a comprehensive set of labels that can help ensure the validity and legitimacy of relevant measurements. Recent efforts therefore focus on collecting person-related datasets that have carefully selected labels, including sensitive characteristics, and consent forms in place to use those attributes for model testing and development. Responsible data collection involves several stages, including but not limited to determining use-case scenarios, selecting categories (annotations) such that the data are fit for the purpose of measuring algorithmic bias for subgroups, and, most importantly, ensuring that the selected categories/subcategories are robust to regional diversities and inclusive of as many subgroups as possible. Meta, in a continuation of our efforts to measure AI algorithmic bias and robustness (https://ai.facebook.com/blog/shedding-light-on-fairness-in-ai-with-a-new-data-set), is working on collecting a large consent-driven dataset with a comprehensive list of categories. This paper describes our proposed design of such categories and subcategories for Casual Conversations v2.
Large collections of time series data are commonly organized into cross-sectional structures with different levels of aggregation; examples include product and geographical groupings. A necessary condition for coherent decision-making and planning with such datasets is that forecasts for the disaggregated series add up exactly to the forecasts of the aggregated series, which has motivated the creation of novel hierarchical forecasting algorithms. The machine learning community's interest in cross-sectional hierarchical forecasting systems is growing, and we are at an opportune moment to ensure that scientific endeavors are grounded in sound baselines. We therefore present the HierarchicalForecast library, which contains preprocessed publicly available datasets, evaluation metrics, and a compiled set of statistical baseline models. Our Python-based framework aims to bridge the gap between statistical and econometric modeling and machine learning forecasting research. Code and documentation are available at https://github.com/nixtla/hierarchicalforecast.
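Coherence here means that each aggregate forecast equals the sum of its disaggregated components. The simplest reconciliation scheme, bottom-up via a summing matrix, can be sketched as follows; the hierarchy and numbers are invented for illustration and do not come from the library's datasets.

```python
import numpy as np

# Toy hierarchy: total = region A + region B; each region has 2 products.
# Bottom-level series order: [A1, A2, B1, B2].
S = np.array([
    [1, 1, 1, 1],   # total
    [1, 1, 0, 0],   # region A
    [0, 0, 1, 1],   # region B
    [1, 0, 0, 0],   # A1
    [0, 1, 0, 0],   # A2
    [0, 0, 1, 0],   # B1
    [0, 0, 0, 1],   # B2
])

# Base forecasts for the bottom-level series only.
y_bottom = np.array([10.0, 5.0, 8.0, 2.0])

# Bottom-up reconciliation: every aggregate is the exact sum of its
# parts, so the full forecast vector is coherent by construction.
y_coherent = S @ y_bottom
```

More sophisticated reconciliation methods differ in how they map a full set of (possibly incoherent) base forecasts back onto this coherent subspace, but all of them produce forecasts of the form `S @ b` for some bottom-level vector `b`.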
The reconstruction of environmental scenes is of great interest for autonomous robotic applications, since an accurate representation of the environment is necessary to ensure safe interaction with robots. Equally, ensuring reliable communication between a robot and its controller is also critical. Large Intelligent Surfaces (LIS) are a technology that has been extensively studied due to its communication capabilities. Moreover, thanks to their large number of antenna elements, these surfaces are a powerful solution for radio sensing. This paper presents a novel method to translate radio environmental maps obtained at the LIS into floor plans of the indoor environment built of scatterers spread across its area. A Least Squares (LS)-based method, a U-Net (UN), and a conditional Generative Adversarial Network (cGAN) are leveraged to perform this task. We show that floor plans can be correctly reconstructed using both local and global measurements.
Background: Accurate segmentation of microscopic structures, such as bio-artificial capsules in microscopy imaging, is a prerequisite for the computer-assisted understanding of important biomechanical phenomena. State-of-the-art segmentation performance is achieved with deep neural networks and related data-driven methods. Training these networks from only a few annotated examples is challenging, while producing manually annotated images to provide supervision is tedious. Methods: Recently, self-supervision, i.e., designing a neural pipeline that provides synthetic or indirect supervision, has been shown to significantly boost the generalization performance of models trained with few shots. The purpose of this paper is to introduce one such neural pipeline in the context of microcapsule image segmentation. Our method leverages the rather simple content of these images, so that a trainee network can be mentored by a referee network previously trained on synthetically generated pairs of corrupted/correct region masks. Results: Challenging experimental setups are investigated. They involve only 3 to 10 annotated images along with a moderate number of unannotated images. On the bio-artificial capsule dataset, our method consistently improves accuracy. We also show that the learned referee network is transferable to another glioblastoma cell dataset and that it can be effectively combined with data augmentation strategies. Conclusion: Experimental results show that the proposed pipeline yields very significant accuracy increments, leading to the conclusion that the self-supervision mechanism introduced in this paper has the potential to replace human annotations.
We describe a lightweight yet performant system for hyperparameter optimization that approximately minimizes an overall scalar cost function obtained by scalarizing multiple performance objectives according to target priorities. It also supports a trade-off mode, where the goal is to find an appropriate trade-off among the objectives through interaction with the user. Our focus is on settings with on the order of tens of hyperparameters, each with various attributes, such as a range of continuous values or a finite list of values, and whether it should be treated on a linear or logarithmic scale. The system supports multiple asynchronous simulations and is robust to simulation stragglers and failures.
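One way to scalarize multiple objectives with target priorities is to penalize only target violations, weighted by priority; the sketch below uses that particular rule as an illustrative assumption, not as the system's actual scalarization, and all names and numbers are hypothetical.

```python
# Combine several performance objectives into one scalar cost by
# penalizing target violations, weighted by priority. The violation-only
# rule shown here is an illustrative assumption.
def scalarize(values, targets, weights):
    """values/targets/weights are aligned lists, one entry per objective,
    ordered from highest to lowest priority."""
    cost = 0.0
    for v, t, w in zip(values, targets, weights):
        cost += w * max(v - t, 0.0)   # penalize only missed targets
    return cost

# Latency (ms) misses its 100 ms target; error rate meets its target.
cost = scalarize([120.0, 0.01], [100.0, 0.02], [1.0, 1000.0])
```

With this rule, an objective that already meets its target contributes nothing, so the optimizer's attention shifts to the objectives still in violation.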
We present LocFormer, a Transformer-based model for video grounding that operates in a constant memory footprint regardless of the video length, i.e., the number of frames. LocFormer is designed for tasks where the entire long video needs to be processed, and at its core lie two main contributions. First, our model incorporates a new sampling technique that splits the input feature sequence into a fixed number of sections and selects a single feature per section using a stochastic approach, which allows us to obtain a set of feature samples representative of the video content for the task at hand while keeping a constant memory footprint. Second, we propose a modular design that separates functionality, enabling us to learn an inductive bias by supervising the self-attention heads while also effectively leveraging pre-trained text and video encoders. We test our proposal on relevant benchmark datasets for video grounding, showing that LocFormer not only achieves excellent results, including state-of-the-art performance on YouCookII, but is also more efficient than its competitors, consistently outperforming prior work by large margins on average, ultimately leading to a new state-of-the-art performance on Charades-STA.
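The constant-memory sampling step described above can be sketched as follows; the function and variable names are hypothetical, and the stochastic rule shown (a uniform choice within each section) is an assumption rather than the paper's exact scheme.

```python
import numpy as np

def sample_sections(features, n_sections, rng):
    """Split a (T, D) feature sequence into n_sections contiguous parts
    and pick one random feature per part, so downstream memory use is
    fixed by n_sections regardless of the number of frames T."""
    T = features.shape[0]
    bounds = np.linspace(0, T, n_sections + 1).astype(int)
    idx = [rng.integers(lo, hi) for lo, hi in zip(bounds[:-1], bounds[1:])]
    return features[idx]

rng = np.random.default_rng(0)
long_video = np.random.default_rng(1).normal(size=(5000, 64))  # 5000 frames
sampled = sample_sections(long_video, n_sections=32, rng=rng)
```

Whether the video has 500 frames or 50,000, the Transformer downstream only ever sees `n_sections` features, which is what makes the memory footprint constant.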
A classical learning setting is one where a student collects data, or observations, about a system and estimates from them a certain quantity of interest. Correctional learning is a cooperative teacher-student framework in which a teacher with knowledge about the system has the possibility to observe and alter (correct) the observations received by the student in order to improve its estimate. In this paper, we show that the variance of the student's estimate is reduced with the help of the teacher. We further formulate the online problem, where the teacher has to decide at each time instant whether or not to alter the observation, as a Markov decision process, from which the optimal policy is derived using dynamic programming. We validate the framework in numerical experiments and compare the optimal online policy with the optimal policy in the batch setting.
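As a toy illustration of why corrections reduce the student's variance, consider a teacher with a fixed budget that replaces the observations deviating most from the true value; this greedy batch rule is invented for illustration and is not the optimal policy derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0                                      # true quantity of interest
obs = theta + rng.normal(scale=1.0, size=200)    # student's noisy data

# Teacher with a limited correction budget replaces the observations
# that deviate most from the true value (greedy batch policy).
budget = 40
worst = np.argsort(-np.abs(obs - theta))[:budget]
corrected = obs.copy()
corrected[worst] = theta

# The corrections shrink the spread of the data around theta, which in
# turn reduces the variance of the student's mean estimate.
```

In the online version studied in the paper, the teacher cannot look back at the whole batch and must instead decide correction-by-correction, which is what motivates the Markov decision process formulation.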
This paper presents a real-time online vision framework to jointly recover an indoor scene's 3D structure and semantic labels. Given noisy depth maps, a camera trajectory, and 2D semantic labels at train time, the proposed deep neural network-based approach learns to fuse frames with suitable semantic labels in the scene space. Our approach exploits a joint volumetric representation of depth and semantics in the scene feature space to solve this task. For a compelling online fusion of the semantics and geometry in real time, we introduce an efficient vortex pooling block while removing the routing network used in online depth fusion to preserve high-frequency surface details. We show that the context information provided by the semantics of the scene helps the depth fusion network learn noise-resistant features. Not only that, it helps overcome the shortcomings of current online depth fusion methods in handling thin object structures, thickening artifacts, and false surfaces. Experimental evaluation on the Replica dataset shows that our approach can perform depth fusion at 37 and 10 frames per second with average reconstruction F-scores of 88% and 91%, respectively, depending on the depth map resolution. In addition, our model shows an average IoU score of 0.515 on the ScanNet 3D semantic benchmark leaderboard.